Reading Data

EDA

Note: Notice how imbalanced our original dataset is! Most of the transactions are non-fraud. If we use this dataframe as is for our predictive models and analysis, we may get a lot of errors, and our algorithms will probably overfit by "assuming" that most transactions are not fraud. But we don't want our model to assume; we want it to detect the patterns that signal fraud!
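
The imbalance described above can be quantified before any modelling. The snippet below is a minimal sketch on made-up data: the `amount` and `is_fraud` column names and all values are assumptions for illustration, not the real dataset. It also shows why plain accuracy is misleading here, since a model that always predicts "non-fraud" already scores as high as the majority-class share.

```python
import pandas as pd

# Toy data standing in for the real transactions.
# `amount` and `is_fraud` are hypothetical column names.
df = pd.DataFrame({
    "amount": [120, 45000, 300, 80, 99000, 150, 60, 220, 75, 410],
    "is_fraud": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})

# Share of each class; a heavily skewed split signals imbalance.
class_share = df["is_fraud"].value_counts(normalize=True)
print(class_share)

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = class_share.max()
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```

With a split this skewed, metrics such as precision, recall or F1 on the fraud class are usually more informative than accuracy.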

Graphical visualization

This shows that fraud occurred across almost every range of transaction amount, whereas non-fraud transactions were all below 50000 INR.
We performed some data mining and feature engineering to create new features for a better understanding of the data.

Label encoding
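
As a rough illustration of this step, the sketch below encodes one categorical column with scikit-learn's `LabelEncoder`. The `payment_method` column and its values are hypothetical. Note that scikit-learn documents `LabelEncoder` as intended for target labels; for input features, `OrdinalEncoder` or one-hot encoding is often the safer choice, because integer codes imply an order that may not exist.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical column; not taken from the real dataset.
df = pd.DataFrame({"payment_method": ["card", "upi", "card", "netbanking"]})

encoder = LabelEncoder()
df["payment_method_enc"] = encoder.fit_transform(df["payment_method"])

# classes_ is sorted, so each category's position is its integer code.
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)
print(df)
```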

Splitting data
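
With an imbalanced target, it usually helps to keep the fraud ratio the same in the training and test sets. The following is a minimal sketch using `train_test_split` with `stratify`; the synthetic `X` and `y` (5% positives) and the 80/20 split are assumptions, not the settings used in the original analysis.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 5% labelled as fraud.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.zeros(1000, dtype=int)
y[:50] = 1

# stratify=y preserves the class ratio in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print("train fraud share:", y_train.mean())
print("test fraud share:", y_test.mean())
```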

Models

LogisticRegression

DecisionTree

KNeighborsClassifier

Grid search: further validation of our models
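
A grid search over a decision tree might look like the sketch below. The parameter grid, the F1 scoring choice, and the synthetic data from `make_classification` are all assumptions made for illustration; the grid actually used in the analysis is not shown in this section.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (about 10% positives).
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Hypothetical grid; tune the ranges to the real data.
param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5, 20]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="f1",  # F1 on the positive class instead of accuracy
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV F1:", round(search.best_score_, 3))
```

Stratified folds keep the fraud ratio stable across folds, which makes the cross-validated scores less noisy on skewed data.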

Updated Decision Tree

Conclusion:

Recommendations to IndAvenue

ML model explained

Strategies: Fast Customer Checkout

The plot above shows how fraud varies across the parts of the day and how much the timing matters. Fraud is more likely to take place in the morning and afternoon.
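
A comparison like this can be built by bucketing the transaction hour into parts of the day and computing the fraud rate per bucket. The sketch below assumes hypothetical `hour` and `is_fraud` columns and arbitrary bucket boundaries; the toy values are invented and do not reproduce the real figures.

```python
import pandas as pd

# Toy data; `hour` (0-23) and `is_fraud` are hypothetical column names.
df = pd.DataFrame({
    "hour": [2, 9, 10, 13, 14, 15, 19, 22, 8, 16],
    "is_fraud": [0, 1, 0, 1, 1, 0, 0, 0, 0, 1],
})

# Left-closed bins: [0, 6) Night, [6, 12) Morning,
# [12, 18) Afternoon, [18, 24) Evening.
df["part_of_day"] = pd.cut(
    df["hour"],
    bins=[0, 6, 12, 18, 24],
    right=False,
    labels=["Night", "Morning", "Afternoon", "Evening"],
)

# Mean of a 0/1 flag is the fraud rate within each bucket.
fraud_rate = df.groupby("part_of_day", observed=True)["is_fraud"].mean()
print(fraud_rate)
```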